We propose a simple and efficient image classification architecture based on deep multiple instance learning and apply it to the challenging task of caries detection in dental radiographs. Technically, our approach contributes in two ways: first, it outputs a heatmap of local patch classification probabilities despite being trained with only weak image-level labels. Second, it can learn from segmentation labels, which guide the training. In contrast to existing methods, a human user can faithfully interpret the predictions and interact with the model to decide which regions to attend to. Experiments are conducted on a large clinical dataset of $\sim$38k bitewings ($\sim$316k teeth), where we achieve competitive performance compared to various baselines. When guided by an external caries segmentation model, a significant improvement in classification and localization performance is observed.
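As a hedged illustration of the weakly supervised setup described above, the sketch below shows one way a patch-level classifier can be pooled into an image-level caries prediction while exposing the patch probabilities as a heatmap; the module and parameter names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the multiple-instance-learning idea: a backbone scores
# local patches, a pooling step aggregates patch probabilities into an
# image-level prediction, and the patch probabilities form a heatmap.
# All names here are illustrative, not the paper's code.
import torch
import torch.nn as nn

class PatchMILClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone          # assumed to output (B, feat_dim, H, W)
        self.patch_head = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)                      # (B, C, H, W)
        patch_logits = self.patch_head(feats)         # (B, 1, H, W)
        heatmap = torch.sigmoid(patch_logits)         # per-patch caries probability
        # Noisy-OR pooling: the image is positive if any patch is positive.
        image_prob = 1.0 - torch.prod(1.0 - heatmap.flatten(1), dim=1)
        return image_prob, heatmap

# Training would use only image-level labels via a BCE loss on image_prob;
# optional segmentation masks could additionally supervise the heatmap.
```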
Large-scale models combining text and images have made incredible progress in recent years. However, they can still fail at tasks requiring compositional knowledge, such as correctly picking out a red cube from a picture of multiple shapes. We examine the ability of CLIP (Radford et al., 2021) to caption images requiring compositional knowledge. We implement five compositional language models to probe the kinds of structure that CLIP may be using, and develop a novel training algorithm, Compositional Skipgram for Images (CoSI), to train these models. We look at performance in attribute-based tasks, requiring the identification of a particular combination of attribute and object (such as "red cube"), and in relational settings, where the spatial relation between two shapes (such as "cube behind sphere") must be identified. We find that in some conditions, CLIP is able to learn attribute-object labellings, and to generalize to unseen attribute-object combinations. However, we also see evidence that CLIP is not able to bind features together reliably. Moreover, CLIP is not able to reliably learn relations between objects, whereas some compositional models are able to learn these perfectly. Of the five models we developed, none were able to generalize to unseen relations.
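For concreteness, the following hedged sketch shows the kind of zero-shot probe such an evaluation implies: scoring a scene image against compositional captions with the public CLIP package and checking whether the correct attribute-object pairing ranks highest. The file name and captions are placeholders, and this is not the probing code used in the paper.

```python
# Zero-shot attribute-object probe with CLIP: rank compositional captions
# by image-text similarity. Paths and captions are illustrative only.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

captions = ["a red cube", "a blue cube", "a red sphere", "a blue sphere"]
image = preprocess(Image.open("shapes_scene.png")).unsqueeze(0).to(device)
tokens = clip.tokenize(captions).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(tokens)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

for caption, p in zip(captions, probs[0].tolist()):
    print(f"{caption}: {p:.3f}")
```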
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
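As a rough usage sketch (not part of the BLOOM paper itself), the snippet below shows few-shot prompting of an open BLOOM checkpoint through Hugging Face transformers; the small bloom-560m variant is chosen only so the example runs on modest hardware.

```python
# Few-shot prompting of an open-access BLOOM checkpoint via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small variant, chosen for accessibility
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "Translate English to French:\n"
    "cheese => fromage\n"
    "bread =>"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```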
This paper studies user attributes in light of current concerns in the recommender system community: diversity, coverage, calibration, and data minimization. In experiments with a conventional context-aware recommender system that leverages side information, we show that user attributes do not always improve recommendations. We then demonstrate that user attributes can negatively impact diversity and coverage. Finally, we investigate the amount of information about users that "survives" from the training data into the recommendation lists produced by the recommender. This information is a weak signal that could be exploited for calibration in the future, or studied further as a privacy leak.
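To make the measures concrete, here is a minimal sketch of two list-level quantities of the kind discussed above, intra-list diversity and catalog coverage; the function names and item representations are illustrative assumptions rather than the paper's evaluation code.

```python
# Intra-list diversity (mean pairwise cosine distance within one recommendation
# list) and catalog coverage (fraction of the catalog recommended to anyone).
from itertools import combinations
import numpy as np

def intra_list_diversity(item_vectors: np.ndarray) -> float:
    """Mean pairwise cosine distance over the items of one recommendation list."""
    dists = []
    for i, j in combinations(range(len(item_vectors)), 2):
        a, b = item_vectors[i], item_vectors[j]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        dists.append(1.0 - cos)
    return float(np.mean(dists)) if dists else 0.0

def catalog_coverage(recommended_lists, catalog_size: int) -> float:
    """Fraction of catalog items that appear in at least one user's top-k list."""
    seen = {item for rec in recommended_lists for item in rec}
    return len(seen) / catalog_size
```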
3D shapes provide substantially more information than 2D images. However, acquiring 3D shapes is sometimes far more difficult, or even impossible, than acquiring 2D images, so it becomes necessary to derive 3D shapes from 2D images. Although this is in general a mathematically ill-posed problem, it can be solved by constraining the problem formulation with prior information. Here, we present a new approach based on Kendall's shape space to reconstruct 3D shapes from single monocular 2D images. The work is motivated by an application to study the feeding behaviour of the basking shark, an endangered species whose massive size and mobility make 3D shape data nearly impossible to obtain, hampering an understanding of its feeding behaviour and ecology. 2D images of these animals in feeding position, however, are readily available. We compare our method to state-of-the-art shape-based approaches, both on human stick models and on shark head skeletons. Using a series of training shapes, we show that the Kendall shape space approach is substantially more robust than previous methods and yields plausible shapes. This is critical for the motivating application, in which specimens are rare and hence only a few training shapes are available.
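As a hedged sketch of the Kendall shape-space machinery the method builds on, the snippet below maps a landmark configuration to pre-shape space (removing translation and scale) and computes a Procrustes-style distance between two configurations; it is illustrative only and not the authors' implementation.

```python
# Kendall pre-shape normalization and a Procrustes-style shape distance.
import numpy as np

def to_preshape(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (k, d) array of k landmark points in d dimensions."""
    centered = landmarks - landmarks.mean(axis=0, keepdims=True)   # remove translation
    return centered / np.linalg.norm(centered)                     # remove scale

def procrustes_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Riemannian distance between two pre-shapes after optimal rotation."""
    _, s, _ = np.linalg.svd(to_preshape(y).T @ to_preshape(x))
    # The optimal rotation aligning x to y yields a distance given by the
    # sum of singular values (full Procrustes fit, reflection ignored here).
    return float(np.arccos(np.clip(s.sum(), -1.0, 1.0)))
```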
We view the problem of human-robot collaborative problem solving as a planning task coupled with natural language communication. Our framework consists of three components: a natural language engine that parses language utterances into a formal representation, and vice versa; a concept learner that induces generalized concepts for plans based on limited interaction with the user; and an HTN planner that solves the task based on the interaction with the human. We illustrate the ability of this framework to address the key challenges of collaborative problem solving by demonstrating it on a collaborative building task in a Minecraft-based Blocksworld domain. An accompanying demo video is available at https://youtu.be/q1pwe4aahf0.
Automatic Identification System (AIS) messages are useful for tracking vessel activity across oceans worldwide using radio links and satellite transceivers. Such data play a significant role in tracking vessel activity and mapping mobility patterns such as those found in fishing. Accordingly, this paper proposes a geometry-driven, semi-supervised approach for detecting fishing activity from AIS data. Through the proposed methodology, we show how to explore the information included in the messages to extract features describing the geometry of the vessel's route. To that end, we leverage the unsupervised nature of cluster analysis to label trajectory geometries, highlighting changes in the vessel's movement patterns that tend to indicate fishing activity. The labels obtained by the proposed unsupervised approach are used to detect fishing activity, which we frame as a time-series classification task. In this context, we propose a solution using recurrent neural networks on AIS data streams that achieves an overall $F$-score of roughly 87% over the full trajectories of about 50 different unseen fishing vessels, as sketched below. These results are accompanied by a broad benchmark study assessing the performance of different recurrent neural network (RNN) architectures. In conclusion, this work contributes a thorough process that includes data preparation, labeling, data modeling, and model validation, and thereby presents a novel solution for mobility pattern detection that relies on unfolding the trajectory in time and observing its inherent geometry.
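The time-series classification step can be pictured with the hedged sketch below: an LSTM consumes per-message trajectory features and outputs a fishing probability per sequence. The feature count, hidden size, and classification head are illustrative assumptions, not the paper's exact architecture.

```python
# LSTM-based time-series classifier over AIS trajectory features
# (e.g., speed, course change, local curvature). Dimensions are illustrative.
import torch
import torch.nn as nn

class FishingActivityRNN(nn.Module):
    def __init__(self, n_features: int = 4, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                          # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)                 # h_n: (1, batch, hidden_size)
        return torch.sigmoid(self.head(h_n[-1]))   # fishing probability per sequence

# Example: a batch of 8 trajectories, 200 AIS messages each, 4 features.
model = FishingActivityRNN()
probs = model(torch.randn(8, 200, 4))
```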
Language models demonstrate both quantitative improvements and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are still poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates with model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated for many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning, and avoids learning the transition dynamics entirely. We show that our GSP algorithm can learn significantly faster than a Double DQN baseline in a variety of situations.
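As a hedged, tabular caricature of the idea (not the authors' algorithm), the sketch below performs background planning restricted to a small subgoal set, using assumed local models that estimate the reward and discounting accumulated when travelling between subgoals.

```python
# Background planning over a subgoal set with assumed local models.
# reward_model[g][g2] and discount_model[g][g2] are hypothetical estimates of
# the return and discounting accumulated when reaching subgoal g2 from g.

def goal_space_planning(subgoals, reward_model, discount_model, n_sweeps=50):
    """Iteratively update subgoal values via Dyna-style background sweeps."""
    v = {g: 0.0 for g in subgoals}
    for _ in range(n_sweeps):
        for g in subgoals:
            # Update restricted to subgoal-to-subgoal transitions; no raw
            # transition dynamics are ever iterated step by step.
            v[g] = max(
                reward_model[g][g2] + discount_model[g][g2] * v[g2]
                for g2 in subgoals if g2 != g
            )
    return v
```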
Adversarial images are created with the intention of causing an image classifier to produce a misclassification. In this paper, we propose that adversarial images should be evaluated based on semantic mismatch, rather than label mismatch, as used in current work. In other words, we propose that an image of a "mug" would be considered adversarial if classified as "turnip", but not as "cup", as current systems would assume. Our novel idea of taking semantic misclassification into account in the evaluation of adversarial images offers two benefits. First, it is a more realistic conceptualization of what makes an image adversarial, which is important in order to fully understand the implications of adversarial images for security and privacy. Second, it makes it possible to evaluate the transferability of adversarial images to a real-world classifier, without requiring the classifier's label set to have been available during the creation of the images. The paper carries out an evaluation of a transfer attack on a real-world image classifier that is made possible by our semantic misclassification approach. The attack reveals patterns in the semantics of adversarial misclassifications that could not be investigated using conventional label mismatch.
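One hedged way to operationalize the proposed semantic-mismatch criterion is sketched below, using WordNet path similarity between the true and predicted labels; the threshold and the use of WordNet are illustrative assumptions, not necessarily the paper's metric.

```python
# Judge a misclassification as adversarial only if the predicted label is
# semantically far from the true label (WordNet path similarity).
# Requires the WordNet corpus: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def is_semantic_mismatch(true_label: str, predicted_label: str,
                         threshold: float = 0.2) -> bool:
    true_syns = wn.synsets(true_label, pos=wn.NOUN)
    pred_syns = wn.synsets(predicted_label, pos=wn.NOUN)
    if not true_syns or not pred_syns:
        return True  # no basis for a semantic match; treat as mismatch
    best = max(
        (t.path_similarity(p) or 0.0) for t in true_syns for p in pred_syns
    )
    return best < threshold

print(is_semantic_mismatch("mug", "cup"))     # likely False: semantically close
print(is_semantic_mismatch("mug", "turnip"))  # likely True: semantically distant
```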